Introduction to Reproducibility and R Markdown

Shonda Kuiper

January 16, 2019

Growing Demand for Statistics

McKinsey & Company (Manyika et al., 2011) has predicted shortfalls of 150,000 data analysts and 1.5 million managers who are knowledgeable about data and their relevance.

Challenges to the Growing Demand for Statistics

Reproducibility

Reproducibility is the ability of a study or experiment to be duplicated, by the same researchers or by someone else working independently.
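
As a minimal sketch of one narrow, computational sense of this definition, the Python snippet below reruns the same simulated analysis twice and checks that the result is identical. The function name `run_analysis`, the seed value, and the simulated data are all hypothetical, chosen only for illustration; fixing the random seed is one common ingredient of a reproducible workflow, not a complete recipe.

```python
import random
import statistics


def run_analysis(seed):
    """Simulate a small sample and return its mean (hypothetical analysis)."""
    rng = random.Random(seed)  # a private, seeded generator keeps reruns identical
    sample = [rng.gauss(100, 15) for _ in range(30)]
    return round(statistics.mean(sample), 4)


first = run_analysis(seed=2019)
second = run_analysis(seed=2019)
print(first == second)  # prints True: same seed, same code, same result
```

Without the fixed seed, each run would draw a different sample, and a reader rerunning the code could not confirm the reported number.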

Reproducibility: Science Isn’t Broken (fivethirtyeight.com)

"Nosek’s team invited researchers to take part in a crowdsourcing data analysis project. The setup was simple. Participants were all given the same data set and prompt: Do soccer referees give more red cards to dark-skinned players than light-skinned ones? They were then asked to submit their analytical approach for feedback from other teams before diving into the analysis.

Twenty-nine teams with a total of 61 analysts took part. The researchers used a wide variety of methods, ranging - for those of you interested in the methodological gore - from simple linear regression techniques to complex multilevel regressions and Bayesian approaches. They also made different decisions about which secondary variables to use in their analyses."

Science Isn’t Broken: https://fivethirtyeight.com/features/science-isnt-broken/#part1

p-Hacking

https://www.youtube.com/watch?v=FLNeWgs2n_Q

p-Hacking, Multiple Comparisons and Data Dredging

Spurious Correlations: http://tylervigen.com/spurious-correlations

Does having Democrats in power improve our economy?

How can you use p-values in multiple regression to prove or disprove the relationship between:

With large datasets, we can almost always find “significant results” to support our conclusions.
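
A small simulation can make this concrete. The sketch below, in Python with only the standard library, tests many pure-noise predictors against a pure-noise outcome and counts how many clear the usual 0.05 threshold. Everything here is an assumption made for illustration: the function names, the sample size, the number of predictors, and the use of a Fisher z-transform to approximate the p-value of a correlation.

```python
import math
import random
from statistics import NormalDist


def pearson(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def count_false_positives(n_obs=50, n_predictors=200, alpha=0.05, seed=1):
    """Count noise predictors whose correlation with y has p < alpha."""
    rng = random.Random(seed)
    y = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_predictors):
        # Each x is generated independently of y, so any "finding" is spurious.
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        r = pearson(x, y)
        # Fisher z-transform: approximate two-sided p-value for H0: rho = 0.
        z = math.atanh(r) * math.sqrt(n_obs - 3)
        p = 2 * (1 - NormalDist().cdf(abs(z)))
        if p < alpha:
            hits += 1
    return hits


hits = count_false_positives()
print(f"{hits} of 200 noise predictors have p < 0.05")
```

With 200 unrelated predictors and a 0.05 threshold, about 10 "significant" results are expected by chance alone. Reporting only those, without mentioning the other tests, is the essence of data dredging.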

He uses statistics as a drunken man uses lamp posts - for support rather than for illumination. - Andrew Lang

When a measure becomes a target, it is no longer a measure. - Goodhart’s law

Analysis of more than 4,000 studies of neurological diseases suggests that the published work - some of which was used to justify human clinical trials - is biased towards reporting positive results.

“We know that as much as 30 percent of the most influential original medical research papers later turn out to be wrong or exaggerated.”

Introduction to R Markdown

Introduction to R Markdown by Andrew Bray

http://prezi.com/dvmgx17e_was/reproducible/?utm_campaign=share&utm_medium=copy
